Approximation Algorithms for Min-Sum k-Clustering and Balanced k-Median

Authors

  • Babak Behsaz
  • Zachary Friggstad
  • Mohammad R. Salavatipour
  • Rohit Sivakumar

Abstract

We consider two closely related fundamental clustering problems in this paper. In min-sum k-clustering, one is given a metric space and has to partition the points into k clusters while minimizing the sum of pairwise distances between the points within the clusters. In the Balanced k-Median problem, the instance is the same and one has to obtain a clustering into k clusters $C_1, \dots, C_k$, where each cluster $C_i$ has a center $c_i$, while minimizing the total assignment cost of the points in the metric; here the cost of assigning a point $j$ to a cluster $C_i$ is equal to $|C_i|$ times the distance $d(j, c_i)$ in the metric.

In this paper, we present an $O(\log n)$-approximation for both these problems, where $n$ is the number of points in the metric that are to be served. This is an improvement over the $O(\epsilon^{-1} \log^{1+\epsilon} n)$-approximation (for any constant $\epsilon > 0$) obtained by Bartal, Charikar, and Raz [STOC ’01]. We also obtain a quasi-PTAS for Balanced k-Median in metrics with constant doubling dimension.

As in the work of Bartal et al., our approximation for general metrics uses embeddings into tree metrics. The main technical contribution in this paper is an $O(1)$-approximation for Balanced k-Median in hierarchically separated trees (HSTs). Our improvement comes from a more direct dynamic programming approach that heavily exploits properties of standard HSTs. In this way, we avoid the reduction to special types of HSTs that were considered by Bartal et al., thereby avoiding an additional $O(\epsilon^{-1} \log^{\epsilon} n)$ loss.

‡Supported by NSERC.
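The two objectives defined in the abstract can be made concrete with a small evaluator. The Python sketch below is illustrative only and is not the authors' algorithm: it uses Euclidean distance as a stand-in for a general metric, and the helper names (`dist`, `min_sum_cost`, `balanced_kmedian_cost`, `best_centers`) are hypothetical. It also checks a standard consequence of the triangle inequality that explains why the two problems are closely related: if each center is chosen as the member of its cluster that minimizes the total distance to the other members, the min-sum cost is at most the Balanced k-Median cost, which in turn is at most twice the min-sum cost.

```python
import itertools
import math

# Euclidean distance stands in for an arbitrary metric d (an assumption for
# this sketch; any function satisfying the metric axioms would work).
def dist(p, q):
    return math.dist(p, q)

def min_sum_cost(clusters, d=dist):
    """Min-sum k-clustering objective: sum of d(u, v) over unordered pairs u, v in the same cluster."""
    return sum(d(u, v) for C in clusters for u, v in itertools.combinations(C, 2))

def balanced_kmedian_cost(clusters, centers, d=dist):
    """Balanced k-Median objective: point j in cluster C_i costs |C_i| * d(j, c_i)."""
    return sum(len(C) * d(j, c) for C, c in zip(clusters, centers) for j in C)

def best_centers(clusters, d=dist):
    """For each cluster, the member minimizing the total distance to the cluster's points."""
    return [min(C, key=lambda c: sum(d(j, c) for j in C)) for C in clusters]

if __name__ == "__main__":
    clusters = [[(0, 0), (1, 0), (0, 1)], [(10, 10), (11, 10)]]
    centers = best_centers(clusters)
    ms = min_sum_cost(clusters)
    bk = balanced_kmedian_cost(clusters, centers)
    # Triangle inequality: ms <= bk holds for any centers; bk <= 2 * ms holds
    # because each chosen center is a best member of its own cluster.
    assert ms <= bk <= 2 * ms
    print(f"min-sum cost = {ms:.3f}, balanced k-median cost = {bk:.3f}")
```

The lower bound needs no restriction on the centers, since $d(u, v) \le d(u, c) + d(c, v)$ for every pair. The upper bound relies on picking centers from within each cluster: averaged over all members $c \in C_i$, the quantity $|C_i| \sum_{j \in C_i} d(j, c)$ equals twice the cluster's pairwise sum, so the best member does at least as well.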

Similar resources

Sublinear-Time Approximation for Clustering Via Random Sampling

In this paper we present a novel analysis of a random sampling approach for three clustering problems in metric spaces: k-median, min-sum k-clustering, and balanced k-median. For all these problems we consider the following simple sampling scheme: select a small sample set of points uniformly at random from V and then run some approximation algorithm on this sample set to compute an approximati...

Finding Low Error Clusterings

A common approach for solving clustering problems is to design algorithms to approximately optimize various objective functions (e.g., k-means or min-sum) defined in terms of some given pairwise distance or similarity information. However, in many learning motivated clustering applications (such as clustering proteins by function) there is some unknown target clustering; in such cases the pairw...

Approximate clustering without the approximation

Approximation algorithms for clustering points in metric spaces is a flourishing area of research, with much research effort spent on getting a better understanding of the approximation guarantees possible for many objective functions such as k-median, k-means, and min-sum clustering. This quest for better approximation algorithms is further fueled by the implicit hope that these better approxi...

A Survey on Exact and Approximation Algorithms for Clustering

Given a set of points $P$ in $\mathbb{R}^d$, a clustering problem is to partition $P$ into $k$ subsets $\{P_1, P_2, \dots, P_k\}$ in such a way that a given objective function is minimized. The most studied cost functions for a cluster, $\mu(P_i)$, are maximum or average radius of $P_i$, maximum diameter of $P_i$, and maximum width of $P_i$. The overall objective function is $\oplus\, \mu(P_i)$, where $\oplus$ is typically the $L_p$-norm operator. The mo...

Distributed Balanced Clustering via Mapping Coresets

Large-scale clustering of data points in metric spaces is an important problem in mining big data sets. For many applications, we face explicit or implicit size constraints for each cluster which leads to the problem of clustering under capacity constraints or the “balanced clustering” problem. Although the balanced clustering problem has been widely studied, developing a theoretically sound di...

Publication date: 2015